Acknowledgements
This data was gathered by Jake Daniels. It covers articles published under the SEO tag on Medium.com between 2018-01-01 and 2018-05-31. Each record includes the title, date of publication, claps generated, author, reading time, and URL.
There are a total of 8396 articles in this dataset.
Here’s a sample of the data:
Time-series
Here’s how the collected data looks over time.
Weekly/Daily Posts
Daily and weekly totals are shown along with a trend line. We’ll look at some trending words per week afterwards.
Day-Of-Week Frequency
We can summarize articles by weekday. Later we can examine their effectiveness.
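As a rough illustration of the weekday summary, here is a minimal Python sketch. The dates are made up for the example and `posts_per_weekday` is a hypothetical helper, not part of the original analysis.

```python
from collections import Counter
from datetime import date

# Hypothetical publication dates; the real dataset has one date per article.
published = [
    date(2018, 1, 1),   # Monday
    date(2018, 1, 2),   # Tuesday
    date(2018, 1, 8),   # Monday
    date(2018, 1, 13),  # Saturday
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def posts_per_weekday(dates):
    """Count how many articles were published on each day of the week."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in dates)
    # Return every weekday in calendar order, including days with no posts.
    return {day: counts.get(day, 0) for day in WEEKDAYS}


weekday_counts = posts_per_weekday(published)
for day, n in weekday_counts.items():
    print(f"{day}: {n}")
```

Keeping the zero-count days in the result makes it easier to chart all seven weekdays side by side.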

Weekly
Using term frequency-inverse document frequency (TF-IDF), we can find the most relevant words of each week.

If you’re curious, here’s a chart of the three most relevant words per week. You can look for the highest clap averages and see which topics were important that week.
I recommend sorting by avg_claps_that_week (click the table to sort) to see which relevant words were likely responsible for increased claps.
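To make the weekly TF-IDF idea concrete, here is a minimal Python sketch that treats each week as one "document". The weeks and words are invented, and the formula used (term count divided by week length, multiplied by the natural log of total weeks over weeks containing the term) is one common definition; the original analysis may have used a different variant.

```python
import math
from collections import Counter

# Hypothetical headline words grouped by week; each week acts as one document.
weeks = {
    "2018-W01": ["seo", "trend", "trend", "guide"],
    "2018-W02": ["seo", "voice", "search", "voice"],
    "2018-W03": ["seo", "guide", "backlink", "backlink"],
}


def tf_idf(docs):
    """Return {doc: {term: score}} using tf = count / length, idf = ln(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))  # count each term once per document
    scores = {}
    for name, words in docs.items():
        counts = Counter(words)
        total = len(words)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


scores = tf_idf(weeks)
# The highest-scoring word of each week.
top_word = {week: max(s, key=s.get) for week, s in scores.items()}
print(top_word)
```

A word such as "seo" that appears in every week gets an inverse document frequency of zero, so it never ranks as a week's most relevant word even though it is the most frequent overall.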
Frequent Terms
Here are the words and phrases used most often in headlines. The phrases have been stemmed so that related word forms are grouped together.


Word Networks
Here is a network of the correlated words in the article headlines.

We can look at those groupings and see clusters of topics.
If we add another dimension, then we can see which of the networks are most effective for generating claps.
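One way to build such a network is to correlate word presence across headlines and keep only the strongly correlated pairs as edges. The Python sketch below assumes the phi coefficient as the correlation measure and a threshold of 0.5; both are assumptions for illustration, and the headlines are invented.

```python
import math
from itertools import combinations

# Hypothetical headlines, each reduced to its set of words.
headlines = [
    {"voice", "search", "seo"},
    {"voice", "search", "trend"},
    {"link", "build", "seo"},
    {"link", "build", "guide"},
    {"seo", "guide"},
]


def phi(word_a, word_b, docs):
    """Phi coefficient of two words based on presence/absence in each headline."""
    n11 = sum(1 for d in docs if word_a in d and word_b in d)
    n10 = sum(1 for d in docs if word_a in d and word_b not in d)
    n01 = sum(1 for d in docs if word_a not in d and word_b in d)
    n00 = sum(1 for d in docs if word_a not in d and word_b not in d)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


vocabulary = sorted(set().union(*headlines))
threshold = 0.5  # assumed cut-off for drawing an edge

# Edges of the network: word pairs whose correlation meets the threshold.
edges = {}
for a, b in combinations(vocabulary, 2):
    score = phi(a, b, headlines)
    if score >= threshold:
        edges[(a, b)] = score

for pair, score in edges.items():
    print(pair, round(score, 2))
```

To add the extra dimension mentioned above, each node could also carry the average claps of the headlines that contain the word, which would then drive its colour.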


Need help reading the charts?
Each connected node is part of a topic. We depend on the colours to distinguish which are popular or unpopular.
Positive Trends
- Red is good, especially when it’s a larger node
- Networks with red in them represent topics that are popular
- Small blue nodes may have topics that have yet to be packaged correctly
- Small red nodes can represent under-utilized topics
Negative Trends
- Blue is ineffective, especially when it’s a larger node
- White is neutral; these words/topics are performing at an average rate
Topic Clusters
The networks above show relationships between words that create topics.
Next, we’ll let the computer create topics by looking for clusters of words. We use unsupervised machine learning to find clusters of words that are frequently used together. The topic behind each cluster can typically be inferred from its words. It’s a good way to get a feel for big trends currently underway in the industry.
Here are 5 clusters, each described by 8 words:

This creates our topic clusters! They are great for brainstorming content and knowing what’s commonly talked about. I’ll include a peek at what is going on under the hood.
Tweaking the number of clusters and the number of words per cluster forms different topics.
Word Impact
Now we have some interactivity.
Here are the words that are impactful or overused, and the words that are proven to bring claps.
The size is based on another measurement called the geometric mean. It is often used when data is highly skewed, because it is less affected by a few extreme values than the arithmetic mean. It can be a tie-breaker for clusters that are close together.
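To show why the geometric mean suits skewed data, here is a minimal Python sketch comparing it with the arithmetic mean. The clap counts are invented, and the helper assumes strictly positive values; the source does not say how articles with zero claps were handled.

```python
import math


def geometric_mean(values):
    """Geometric mean via the mean of logs; assumes all values are positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical clap counts for one word: mostly small, with one viral outlier.
claps = [1, 2, 4, 8, 1000]

arithmetic = sum(claps) / len(claps)
geometric = geometric_mean(claps)

print(f"arithmetic mean: {arithmetic:.1f}")
print(f"geometric mean:  {geometric:.1f}")
```

The single outlier dominates the arithmetic mean, while the geometric mean stays much closer to what a typical article earns.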
Curious what the other words are? Interact with this graph:
And here’s a table of those terms ranked by geometric mean:
Top 25 Terms: Performance
| Word | Geometric Average | Occurrences |
|---|---|---|
| write | 1.27 | 125 |
| tool | 1.24 | 158 |
| content | 1.11 | 315 |
| expert | 1.06 | 126 |
| googl | 1.04 | 526 |
| blog | 1.00 | 287 |
| post | 0.92 | 108 |
| site | 0.91 | 252 |
| backlink | 0.90 | 115 |
| free | 0.90 | 112 |
| step | 0.88 | 145 |
| trend | 0.78 | 107 |
| lead | 0.76 | 91 |
| result | 0.70 | 94 |
| organ | 0.69 | 100 |
| search | 0.68 | 593 |
| rank | 0.63 | 327 |
| creat | 0.62 | 90 |
| guid | 0.62 | 174 |
| top | 0.62 | 371 |
| effect | 0.60 | 131 |
| page | 0.60 | 237 |
| tip | 0.60 | 367 |
| increas | 0.58 | 137 |
| list | 0.56 | 115 |
Viral Words
Let’s separate popular articles from average ones and see if there’s a difference in word usage.

Let’s put them on the same graph and look at the difference in proportional usage.
Word Choice in Viral Posts vs Avg. Post
Take a look at the viral phrases:
- if diff is positive, the phrase is more common in viral posts
- if diff is negative, the phrase is more common in average posts
- if diff is close to 0, the phrase is found commonly in both
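A diff value of this kind can be sketched as the share of a phrase among viral headlines minus its share among average headlines. The Python example below uses invented phrase lists; how the original analysis decided which posts count as viral is not stated, so that split is assumed to have been made already.

```python
from collections import Counter


def proportions(phrases):
    """Share of each phrase among all phrases in the group."""
    counts = Counter(phrases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phrase: count / total for phrase, count in counts.items()}


def usage_diff(viral, average):
    """Proportion in viral posts minus proportion in average posts, per phrase."""
    p_viral = proportions(viral)
    p_average = proportions(average)
    phrases = set(p_viral) | set(p_average)
    return {p: p_viral.get(p, 0.0) - p_average.get(p, 0.0) for p in phrases}


# Hypothetical stemmed two-word phrases from each group of headlines.
viral_phrases = ["seo strategi", "seo strategi", "voic search", "web design"]
average_phrases = ["web design", "web design", "digit market", "seo strategi"]

diff = usage_diff(viral_phrases, average_phrases)
for phrase, value in sorted(diff.items(), key=lambda item: item[1], reverse=True):
    print(f"{phrase}: {value:+.2f}")
```

Because both sets of proportions sum to 1, the diffs always sum to roughly zero: a phrase can only look viral at the expense of others looking average.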
Viral Words
| word | diff |
|---|---|
| seo strategi | 0.0028526 |
| googl analyt | 0.0027334 |
| content market | 0.0026798 |
| search engin | 0.0019226 |
| user experi | 0.0014994 |
| seo tool | 0.0013533 |
| voic search | 0.0013533 |
| engin optim | 0.0013265 |
| guest post | 0.0010880 |
| seo checklist | 0.0010880 |
| seo expert | 0.0010612 |
| app store | 0.0006766 |
| busi onlin | 0.0006766 |
| domain author | 0.0006766 |
| engin market | 0.0006766 |
| featur snippet | 0.0006766 |
| free seo | 0.0006766 |
| imag search | 0.0006766 |
| market servic | 0.0006766 |
| onlin consult | 0.0006766 |
| page rank | 0.0006766 |
| googl search | 0.0006498 |
| meta descript | 0.0005306 |
| search result | 0.0005306 |
| seo friend | 0.0005306 |
| seo mistak | 0.0005306 |
| market compani | 0.0004769 |
| 一定 要 | 0.0002653 |
| black hat | 0.0002653 |
| blog site | 0.0002653 |
| busi seo | 0.0002653 |
| free tool | 0.0002653 |
| hat seo | 0.0002653 |
| increas organ | 0.0002653 |
| manag servic | 0.0002653 |
| market campaign | 0.0002653 |
| market tip | 0.0002653 |
| market trend | 0.0002653 |
| quick guid | 0.0002653 |
| seo audit | 0.0002653 |
| seo copywrit | 0.0002653 |
| seo plugin | 0.0002653 |
| seo search | 0.0002653 |
| media market | 0.0001192 |
| meta tag | 0.0001192 |
| page speed | 0.0001192 |
| qualiti backlink | 0.0001192 |
| seo hack | 0.0001192 |
| websit traffic | 0.0001192 |
| site list | -0.0000537 |
| маркетинг в | -0.0001461 |
| маркетинг для | -0.0001461 |
| ต่อ seo | -0.0001461 |
| เว็บไซต์ ของ | -0.0001461 |
| action tip | -0.0001461 |
| basic seo | -0.0001461 |
| build qualiti | -0.0001461 |
| build strategi | -0.0001461 |
| busi blog | -0.0001461 |
| busi directori | -0.0001461 |
| busi growth | -0.0001461 |
| common mistak | -0.0001461 |
| compani offer | -0.0001461 |
| design develop | -0.0001461 |
| digit strategi | -0.0001461 |
| ecommerc site | -0.0001461 |
| effect seo | -0.0001461 |
| email market | -0.0001461 |
| essenti seo | -0.0001461 |
| estat market | -0.0001461 |
| evergreen content | -0.0001461 |
| feel lucki | -0.0001461 |
| frog seo | -0.0001461 |
| generat idea | -0.0001461 |
| generat lead | -0.0001461 |
| golden rule | -0.0001461 |
| googl confirm | -0.0001461 |
| googl job | -0.0001461 |
| googl seo | -0.0001461 |
| googl trend | -0.0001461 |
| googl voic | -0.0001461 |
| increas traffic | -0.0001461 |
| it’ time | -0.0001461 |
| latent semant | -0.0001461 |
| learn tool | -0.0001461 |
| load speed | -0.0001461 |
| love seo | -0.0001461 |
| market de | -0.0001461 |
| market expert | -0.0001461 |
| market solut | -0.0001461 |
| media manag | -0.0001461 |
| nowfloat boost | -0.0001461 |
| onlin store | -0.0001461 |
| optim content | -0.0001461 |
| page optim | -0.0001461 |
| pbn backlink | -0.0001461 |
| post checklist | -0.0001461 |
| profession digit | -0.0001461 |
| python panda | -0.0001461 |
| qualiti content | -0.0001461 |
| react spa | -0.0001461 |
| reput manag | -0.0001461 |
| scream frog | -0.0001461 |
| search market | -0.0001461 |
| semant index | -0.0001461 |
| seo blog | -0.0001461 |
| seo de | -0.0001461 |
| seo market | -0.0001461 |
| seo perform | -0.0001461 |
| seo spider | -0.0001461 |
| simpl seo | -0.0001461 |
| site search | -0.0001461 |
| site speed | -0.0001461 |
| site web | -0.0001461 |
| sitio web | -0.0001461 |
| social bookmark | -0.0001461 |
| speed updat | -0.0001461 |
| step seo | -0.0001461 |
| structur data | -0.0001461 |
| sunday talk | -0.0001461 |
| top digit | -0.0001461 |
| top free | -0.0001461 |
| top search | -0.0001461 |
| tu sitio | -0.0001461 |
| você precisa | -0.0001461 |
| wordpress blog | -0.0001461 |
| yoast seo | -0.0001461 |
| googl map | -0.0002921 |
| googl rank | -0.0002921 |
| growth hack | -0.0002921 |
| market digit | -0.0002921 |
| servic provid | -0.0002921 |
| submiss site | -0.0002921 |
| de seo | -0.0004382 |
| drive traffic | -0.0004382 |
| market agenc | -0.0004382 |
| search consol | -0.0004382 |
| top seo | -0.0004382 |
| крауд маркетинг | -0.0005842 |
| lead generat | -0.0005842 |
| blog post | -0.0006111 |
| seo tip | -0.0006111 |
| boost seo | -0.0007035 |
| brand awar | -0.0007035 |
| content strategi | -0.0007035 |
| content writer | -0.0007035 |
| de conteúdo | -0.0007035 |
| design servic | -0.0007035 |
| easi step | -0.0007035 |
| googl updat | -0.0007035 |
| improv seo | -0.0007035 |
| inbound market | -0.0007035 |
| increas websit | -0.0007035 |
| rank factor | -0.0007035 |
| search rank | -0.0007035 |
| seo en | -0.0007035 |
| seo smm | -0.0007035 |
| seo task | -0.0007035 |
| top posit | -0.0007035 |
| web traffic | -0.0007035 |
| wordpress plugin | -0.0007035 |
| market tool | -0.0008495 |
| organ traffic | -0.0008495 |
| seo agenc | -0.0008495 |
| websit design | -0.0008495 |
| websit rank | -0.0008495 |
| market strategi | -0.0009032 |
| seo consult | -0.0009956 |
| beginn guid | -0.0012609 |
| busi websit | -0.0012609 |
| commerc websit | -0.0012609 |
| content optim | -0.0012609 |
| content write | -0.0012609 |
| graphic design | -0.0012609 |
| local busi | -0.0012609 |
| onlin reput | -0.0012609 |
| real estat | -0.0012609 |
| web tasarım | -0.0012609 |
| busi owner | -0.0014069 |
| design compani | -0.0014069 |
| link build | -0.0015530 |
| onlin market | -0.0015530 |
| seo trend | -0.0016991 |
| local seo | -0.0018451 |
| seo rank | -0.0019644 |
| onlin busi | -0.0021104 |
| seo compani | -0.0029331 |
| web develop | -0.0029331 |
| seo techniqu | -0.0040480 |
| internet market | -0.0041940 |
| page seo | -0.0054549 |
| seo servic | -0.0056010 |
| social media | -0.0059199 |
| web design | -0.0071540 |
| digit market | -0.0086563 |
Let’s Combine them!
